ABSTRACT

A new motion estimation approach, the DCT-Based Motion Estimation Scheme (DXT-ME), is presented in this paper. It uses the sinusoidal orthogonal principles to estimate displacements of moving objects in the transform domain, based upon the concept of pseudo phases. The computational complexity of this method is only $O(N^2)$ for an $N \times N$ block, in comparison to the $O(N^4)$ complexity of the Full Search Block Matching Approach (BMA-ME). In addition, the DXT-ME algorithm consists solely of highly parallel local operations, a property that makes parallel implementation feasible. Furthermore, incorporating DXT-ME into a DCT-based video coder combines the DCT and the motion estimation algorithm, which yields further savings in overall system complexity and increases the system throughput. Unlike the pel-recursive algorithm, this scheme is robust even for very noisy images. Because of its feature matching property, simple preprocessing can be applied to images of complicated scenery to extract the features of moving objects, which further improves the performance of DXT-ME. Finally, simulations on a number of video sequences are presented to compare DXT-ME with BMA-ME.

Keywords: Motion estimation, video coding, video compression

I. Introduction

In recent years, motion estimation has attracted great interest because of its many promising applications [1] in high definition television (HDTV), multimedia, video telephony, target detection and tracking, computer vision, and other areas. Extensive research has been done over many years in developing new algorithms [1], [2] and designing cost-effective and massively parallel hardware architectures [3], [4], [5], [6] suitable for current VLSI technology.

The most commonly used motion estimation scheme in video coding is the Full Search Block Matching Algorithm (BMA-ME), which searches for the best candidate block among all the blocks in a larger search area in terms of either the mean-square error [7] or the mean of the absolute frame difference [8]. The computational complexity of this approach is very high, i.e. $O(N^4)$ for an $N \times N$ block. Even so, BMA-ME has been successfully implemented on VLSI chips [3], [4], [5]. To reduce the number of computations, a number of suboptimal fast block matching algorithms have been proposed [7], [8], [9], [10], [11], [12]. However, these algorithms require three or more sequential steps to find suboptimal estimates. Recently a correlation-based approach (CLT-ME) [13] using the Complex Lapped Transform (CLT) to avoid the block effect was proposed, but it still requires searching over a larger search area and thus results in a very high computational burden. Moreover, motion estimation using the CLT-ME is accurate on moving sharp edges but not on blurred edges.
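
To make the cost of full search concrete, the following is a minimal sketch of block matching with the mean absolute difference criterion. The function name `full_search`, its argument names, and the sign convention of the returned displacement (from the block in the current frame to its best match in the previous frame) are illustrative assumptions, not taken from the cited implementations.

```python
def full_search(prev, cur, top, left, n, search):
    """Return ((dy, dx), cost) minimising the mean absolute difference.

    prev, cur : 2-D lists of pixel values (previous and current frames)
    top, left : position of the n x n block in the current frame
    search    : maximum displacement tested in each direction (assumed name)
    """
    rows, cols = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r0, c0 = top + dy, left + dx
            if r0 < 0 or c0 < 0 or r0 + n > rows or c0 + n > cols:
                continue  # candidate block falls outside the frame
            # n * n pixel comparisons for every candidate position
            cost = sum(
                abs(cur[top + i][left + j] - prev[r0 + i][c0 + j])
                for i in range(n)
                for j in range(n)
            ) / (n * n)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost
```

Each of the $(2s+1)^2$ candidates costs $N^2$ comparisons, so a search range $s$ proportional to $N$ gives the $O(N^4)$ figure quoted above.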

In addition to block-based approaches, there are pel-based estimation methods such as the Pel-Recursive Algorithm (PRA-ME) [14], [15] and the Optical Flow Approach (OFA-ME) [16]. These are very vulnerable to noise because they involve only local operations, and they may suffer from instability. For multiframe motion detection, the 3D-FFT has been successfully used to estimate motion in several consecutive frames [17], [18], based on the phenomenon that the spatial and temporal frequencies of a moving object lie on a plane in spatiotemporal space [19]. However, this requires processing several frames rather than two, and the fast Fourier transform operates on complex numbers and is not used in most video standards.

In most international video coding standards such as CCITT H.261 [20] and MPEG [21], as well as the proposed HDTV standard, the Discrete Cosine Transform (DCT) and block-based motion estimation are the essential elements to achieve spatial and temporal compression, respectively. Most implementations of a standard-compliant coder adopt the structure of Coder III (originally named in [22]) as shown in Fig. 1(a). The DCT is located inside the loop of temporal prediction, which also includes an Inverse DCT (IDCT) and a spatial-domain motion estimator (SD-ME), which is usually the BMA-ME. The IDCT is needed solely for transforming the DCT coefficients back to the spatial domain, in which the SD-ME estimates motion vectors and performs motion compensated prediction. This is an undesirable coder architecture for the following reasons. Besides adding complexity to the overall architecture, the DCT and IDCT must be put inside the feedback loop, which has long been recognized as the major bottleneck of the entire digital video system for high-end real-time applications. The throughput of the coder is limited by the processing speed of the feedback loop, which is roughly the total time for the data stream to go through each component in the loop. Therefore the DCT (or IDCT) must be designed to operate at least twice as fast as the incoming data stream. A compromise is to remove the loop and perform open-loop motion estimation based upon original images instead of reconstructed images, at the expense of coder performance [23].

Fig. 1. Coder structures: (a) Coder III is the motion-compensated DCT hybrid coder used in MPEG or H.261 standards with motion estimation done in the spatial domain, (b) Coder II is the motion-compensated DCT hybrid coder with motion estimation performed in the transform domain.

An alternative solution without degradation of the performance is to develop a motion estimation algorithm which can work in the DCT transform domain, as remarked in [22]. In this way, the DCT can be moved out of the loop as depicted in Fig. 1(b), and thus the operating speed of this DCT can be reduced to the data rate of the incoming stream. Moreover, the IDCT is removed from the feedback loop, which now has only two simple components, $Q$ and $Q^{-1}$ (the quantizers), in addition to the transform-domain motion estimator (TD-ME). This not only reduces the complexity of the coder but also resolves the bottleneck problem without any tradeoff in performance. Furthermore, as pointed out in [22], different components can be jointly optimized if they operate in the same transform domain.

In this paper, we present a novel algorithm for motion estimation, called the DCT-Based Motion Estimation (DXT-ME), to estimate motion in the discrete cosine transform or discrete sine transform (DCT/DST, or DXT for short) domain. DXT-ME is based on the principle of orthogonality of sinusoidal functions. This new algorithm has certain merits over conventional methods. It has very low computational complexity (on the order of $N^2$ compared to $N^4$ for BMA-ME) and is robust even in a noisy environment. The algorithm takes DCT coefficients of images as input to estimate motion and therefore can be incorporated efficiently with the DCT-based coders used for most current video compression standards. As explained before, this combination of the DCT and motion estimation into a single component reduces the coder complexity and at the same time increases the system throughput. Finally, because the computation of pseudo phases is an inherently local operation, a highly parallel pipelined architecture for this algorithm is possible.

In the next section, the principles behind this motion estimation scheme are presented. In Section III, the algorithm of DXT-ME is then considered. For video sequences of complicated scenery, some preprocessing is necessary to further improve the performance of this estimator. This preprocessing step is discussed in Section IV along with a simple extension of DXT-ME similar to the decision rule used in MPEG standards. Simulation results are given and discussed in Section V. Finally, we conclude the paper in Section VI.

II.  Sinusoidal  Orthogonal  Principles 

As is well known, the Fourier transform (FT) of a signal $x(t)$ is related to the FT of its shifted (or delayed, if $t$ represents time) version, $x(t-\tau)$, by

$$\mathcal{F}\{x(t-\tau)\} = e^{-j\omega\tau}\,\mathcal{F}\{x(t)\}, \tag{1}$$

where $\mathcal{F}\{\cdot\}$ denotes the Fourier transform. The phase of the Fourier transform of the shifted signal contains the information about the amount of the shift $\tau$, which can easily be extracted. The Discrete Cosine Transform (DCT) and its counterpart, the Discrete Sine Transform (DST), do not have phase components of the kind found in the discrete Fourier transform (DFT), but the DCT (or DST) coefficients of a shifted signal do also carry this shift information. To facilitate explanation of the idea behind DXT-ME, let us first consider the case of one-dimensional discrete signals. Suppose that the signal $\{x_1(n);\ n \in \{0,\ldots,N-1\}\}$ is right shifted by an amount $m$ (in our convention, a right shift means that $m > 0$) to generate another signal $\{x_2(n);\ n \in \{0,\ldots,N-1\}\}$. The values of $x_1(n)$ are all zeros outside the support region $S(x_1)$. Therefore,

$$x_2(n) = \begin{cases} x_1(n-m), & \text{for } n-m \in S(x_1), \\ 0, & \text{elsewhere.} \end{cases} \tag{2}$$

Equation (2) implies that the two signals resemble each other except that the signal is shifted. It can be shown that, for $k = 1,\ldots,N-1$,


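$$X_2^C(k) = Z_1^C(k)\cos\Big[\frac{k\pi}{N}\big(m+\tfrac{1}{2}\big)\Big] - Z_1^S(k)\sin\Big[\frac{k\pi}{N}\big(m+\tfrac{1}{2}\big)\Big], \tag{3}$$

$$X_2^S(k) = Z_1^S(k)\cos\Big[\frac{k\pi}{N}\big(m+\tfrac{1}{2}\big)\Big] + Z_1^C(k)\sin\Big[\frac{k\pi}{N}\big(m+\tfrac{1}{2}\big)\Big]. \tag{4}$$

Here $X_2^S$ and $X_2^C$ are the DST (DST-II) and DCT (DCT-II) of the second kind of $x_2(n)$, respectively, whereas $Z_1^S$ and $Z_1^C$ are the DST (DST-I) and DCT (DCT-I) of the first kind of $x_1(n)$, respectively, defined as follows [24]:

$$X_2^C(k) = \frac{2}{N}\,C(k)\sum_{n=0}^{N-1} x_2(n)\cos\Big[\frac{k\pi}{N}(n+0.5)\Big]; \quad k \in \{0,\ldots,N-1\}, \tag{5}$$

$$X_2^S(k) = \frac{2}{N}\,C(k)\sum_{n=0}^{N-1} x_2(n)\sin\Big[\frac{k\pi}{N}(n+0.5)\Big]; \quad k \in \{1,\ldots,N\}, \tag{6}$$

$$Z_1^C(k) = \frac{2}{N}\,C(k)\sum_{n=0}^{N-1} x_1(n)\cos\Big[\frac{k\pi}{N}\,n\Big]; \quad k \in \{0,\ldots,N\}, \tag{7}$$

$$Z_1^S(k) = \frac{2}{N}\,C(k)\sum_{n=0}^{N-1} x_1(n)\sin\Big[\frac{k\pi}{N}\,n\Big]; \quad k \in \{1,\ldots,N-1\}, \tag{8}$$

$$\text{where } C(k) = \begin{cases} \frac{1}{\sqrt{2}}, & \text{for } k = 0 \text{ or } N, \\ 1, & \text{otherwise.} \end{cases}$$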

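The displacement $m$ is embedded solely in the terms $g_m^s(k) = \sin[\frac{k\pi}{N}(m+\frac{1}{2})]$ and $g_m^c(k) = \cos[\frac{k\pi}{N}(m+\frac{1}{2})]$, which are called pseudo phases, by analogy with phases in the Fourier transform of shifted signals. To find $m$, we first solve (3) and (4) for the pseudo phases and then use the sinusoidal orthogonal principles as follows:

$$\frac{2}{N}\sum_{k=1}^{N} C^2(k)\sin\Big[\frac{k\pi}{N}\big(m+\tfrac{1}{2}\big)\Big]\sin\Big[\frac{k\pi}{N}\big(n+\tfrac{1}{2}\big)\Big] = \delta(m-n) - \delta(m+n+1), \tag{9}$$

$$\frac{2}{N}\sum_{k=0}^{N-1} C^2(k)\cos\Big[\frac{k\pi}{N}\big(m+\tfrac{1}{2}\big)\Big]\cos\Big[\frac{k\pi}{N}\big(n+\tfrac{1}{2}\big)\Big] = \delta(m-n) + \delta(m+n+1). \tag{10}$$

Here $\delta(n)$ is the discrete impulse function.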
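
The two orthogonality relations can be checked numerically. The sketch below evaluates the left-hand sides of (9) and (10) directly; the helper names (`c2`, `sine_sum`, `cosine_sum`, `delta`) are illustrative, and the check is restricted to $-N \le m \le N-1$ and $0 \le n \le N-1$, which is the range assumed here for the identities to hold as stated.

```python
import math

def c2(k, n_len):
    """C(k)^2 as defined in the text: 1/2 for k = 0 or k = N, 1 otherwise."""
    return 0.5 if k in (0, n_len) else 1.0

def sine_sum(m, n, n_len):
    """Left-hand side of (9): sum over k = 1..N."""
    a = math.pi / n_len
    return 2.0 / n_len * sum(
        c2(k, n_len) * math.sin(k * a * (m + 0.5)) * math.sin(k * a * (n + 0.5))
        for k in range(1, n_len + 1)
    )

def cosine_sum(m, n, n_len):
    """Left-hand side of (10): sum over k = 0..N-1."""
    a = math.pi / n_len
    return 2.0 / n_len * sum(
        c2(k, n_len) * math.cos(k * a * (m + 0.5)) * math.cos(k * a * (n + 0.5))
        for k in range(0, n_len)
    )

def delta(i):
    """Discrete impulse function."""
    return 1.0 if i == 0 else 0.0
```

For a negative shift $m$, only the term $\delta(m+n+1)$ can fall inside the window $0 \le n \le N-1$, which is what makes the sign of the observed peak informative.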


KOC AND LIU: DCT-BASED MOTION ESTIMATION

Indeed, if we replace $\sin[\frac{k\pi}{N}(m+\frac{1}{2})]$ and $\cos[\frac{k\pi}{N}(m+\frac{1}{2})]$ by the computed sine and cosine pseudo phase components, $\hat{g}_m^s(k)$ and $\hat{g}_m^c(k)$, respectively in (9) and (10), both equations simply become IDST-II and IDCT-II operations on $\hat{g}_m^s(k)$ and $\hat{g}_m^c(k)$:

$$\mathrm{IDSTII}(\hat{g}_m^s) = \frac{2}{N}\sum_{k=1}^{N} C^2(k)\,\hat{g}_m^s(k)\sin\Big[\frac{k\pi}{N}\big(n+\tfrac{1}{2}\big)\Big], \tag{11}$$

$$\mathrm{IDCTII}(\hat{g}_m^c) = \frac{2}{N}\sum_{k=0}^{N-1} C^2(k)\,\hat{g}_m^c(k)\cos\Big[\frac{k\pi}{N}\big(n+\tfrac{1}{2}\big)\Big]. \tag{12}$$

The notation $\hat{g}$ is used to distinguish the computed pseudo phase from the one in a noiseless situation (i.e. $\sin[\frac{k\pi}{N}(m+\frac{1}{2})]$ or $\cos[\frac{k\pi}{N}(m+\frac{1}{2})]$). A closer look at the right-hand side of (9) tells us that $\delta(m-n)$ and $\delta(m+n+1)$ have opposite signs. This property will help us detect the direction of a motion. If we perform an IDST-II operation on the pseudo phases found, then the observable window of the index space in the inverse DST domain will be limited to $\{0,\ldots,N-1\}$. As illustrated in Fig. 2, for a right-shift motion, one spike (generated by the positive $\delta$ function) points upwards at the location $n = m$ in the gray region (i.e. the observable index space), while the other $\delta$ points downwards at $n = -(m+1)$ outside the gray region. In contrast, for a left-shift motion, the negative spike at $n = -(m+1) \ge 0$ falls in the gray region but the positive $\delta$ function at $n = m$ stays out of the observable index space. It can easily be seen that a positive peak value in the gray region implies a right shift and a negative one means a left shift. This enables us to determine the direction of the movement of a signal from the sign of the peak value.

Fig. 2. How the direction of motion is determined based on the sign of the peak value after application of the sinusoidal orthogonal principle for the DST-II kernel to pseudo phases: (a) how to detect right shift, (b) how to detect left shift.

The concept of pseudo phases plus the application of the sinusoidal orthogonal principles leads to a new approach to estimate the translational motion, as depicted in Fig. 3(a):

1. Compute the DCT-I and DST-I coefficients of $x_1(n)$ and the DCT-II and DST-II coefficients of $x_2(n)$.

2. Compute the pseudo phase $\hat{g}_m^s(k)$ for $k = 1,\ldots,N$ by solving this equation:

$$\hat{g}_m^s(k) = \begin{cases} \dfrac{Z_1^C(k)\,X_2^S(k) - Z_1^S(k)\,X_2^C(k)}{(Z_1^C(k))^2 + (Z_1^S(k))^2}, & \text{for } k \neq N, \\[2ex] \dfrac{1}{\sqrt{2}}, & \text{for } k = N. \end{cases} \tag{13}$$

3. Feed the computed pseudo phase, $\{\hat{g}_m^s(k);\ k = 1,\ldots,N\}$, into an IDST-II decoder to produce an output $\{d(n);\ n = 0,\ldots,N-1\}$, and search for the peak value. Then the estimated displacement $\hat{m}$ can be found by

$$\hat{m} = \begin{cases} i_p, & \text{if } d(i_p) > 0, \\ -(i_p+1), & \text{if } d(i_p) < 0, \end{cases} \tag{14}$$

where $i_p = \arg\max_n |d(n)|$ is the index at which the peak value is located.

In Step 1, the DCT and DST can be generated simultaneously with only $3N$ multipliers [25], [26], [27], and the DCT-I can be easily obtained from the DCT-II with minimal overhead, as will be shown later. In Step 2, if noise is absent and there is only purely translational motion, $\hat{g}_m^s(k)$ will be equal to $\sin\frac{k\pi}{N}(m+0.5)$. The output $d(n)$ will then be an impulse function in the observation window. This procedure is illustrated by two examples in Fig. 3(b) and (c) with a randomly generated signal as input at SNR = 20 dB. In these two examples, the motion estimate is accurate even though strong noise is present.
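
The three steps can be sketched in a few lines for the noiseless case. This is only an illustration under stated assumptions: the transforms are evaluated by direct summation rather than by a fast algorithm, the function name `dxt_me_1d` and the small-denominator guard are hypothetical additions, and the value $1/\sqrt{2}$ at $k = N$ is taken from (13) as given.

```python
import math

def dxt_me_1d(x1, x2):
    """Estimate the integer shift m such that x2(n) = x1(n - m)."""
    n_len = len(x1)
    a = math.pi / n_len
    scale = 2.0 / n_len
    g = [0.0] * (n_len + 1)          # g[k] holds the pseudo phase for k = 1..N
    for k in range(1, n_len):
        # Step 1: DCT-I/DST-I of x1 and DCT-II/DST-II of x2 (C(k) = 1 here).
        zc = scale * sum(x1[n] * math.cos(k * a * n) for n in range(n_len))
        zs = scale * sum(x1[n] * math.sin(k * a * n) for n in range(n_len))
        xc = scale * sum(x2[n] * math.cos(k * a * (n + 0.5)) for n in range(n_len))
        xs = scale * sum(x2[n] * math.sin(k * a * (n + 0.5)) for n in range(n_len))
        # Step 2: solve (3) and (4) for the sine pseudo phase, as in (13).
        den = zc * zc + zs * zs
        g[k] = (zc * xs - zs * xc) / den if den > 1e-12 else 0.0  # guard is an assumption
    g[n_len] = 1.0 / math.sqrt(2.0)  # value given in (13) for k = N
    # Step 3: IDST-II as in (11), then peak search and sign test as in (14).
    d = [
        scale * sum(
            (0.5 if k == n_len else 1.0) * g[k] * math.sin(k * a * (n + 0.5))
            for k in range(1, n_len + 1)
        )
        for n in range(n_len)
    ]
    ip = max(range(n_len), key=lambda n: abs(d[n]))
    return ip if d[ip] > 0 else -(ip + 1)
```

With a short pulse placed well inside a 16-sample block, a right shift by 5 produces a positive peak at index 5, and a left shift by 3 produces a negative peak at index 2, which (14) maps to $-3$.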


III.  DCT-Based  Motion  Estimation  Algorithm  (DXT-ME) 

The concept of how to extract shift values from the pseudo phases of one dimensional signals, as explained in Section II, can be extended to provide the basis of a new DCT-based motion estimation scheme for two dimensional images, which we call the DXT-ME motion estimation scheme. Before going into the details of this new algorithm, let us confine the problem of motion estimation to the scenario in which an object moves translationally by $m_1$ in the X direction and $n_1$ in the Y direction as viewed on the camera plane and within the scope of a camera in a noiseless environment, as shown in Fig. 4. Then we can extract the motion vector from two consecutive frames of the images of that moving object by making use of the sinusoidal orthogonal principles (9) and (10).

The algorithm of this DCT-based motion estimation scheme is depicted in Fig. 5. The current frame $x_t$ and the previous frame $x_{t-1}$ are fed into 2D-DCT-II and 2D-DCT-I coders, respectively. A 2D-DCT-II coder computes four coefficients, DCCTII, DCSTII, DSCTII, and DSSTII, each of which is defined as a two-dimensional separable function formed by 1D-DCT/DST-II kernels:

$$X_t^{cc}(k,l) = \frac{4}{N^2}C(k)C(l)\sum_{m,n=0}^{N-1} x_t(m,n)\cos\Big[\frac{k\pi}{N}(m+0.5)\Big]\cos\Big[\frac{l\pi}{N}(n+0.5)\Big], \quad k,l \in \{0,\ldots,N-1\}, \tag{15}$$

$$X_t^{cs}(k,l) = \frac{4}{N^2}C(k)C(l)\sum_{m,n=0}^{N-1} x_t(m,n)\cos\Big[\frac{k\pi}{N}(m+0.5)\Big]\sin\Big[\frac{l\pi}{N}(n+0.5)\Big], \quad k \in \{0,\ldots,N-1\},\ l \in \{1,\ldots,N\}, \tag{16}$$

$$X_t^{sc}(k,l) = \frac{4}{N^2}C(k)C(l)\sum_{m,n=0}^{N-1} x_t(m,n)\sin\Big[\frac{k\pi}{N}(m+0.5)\Big]\cos\Big[\frac{l\pi}{N}(n+0.5)\Big], \quad k \in \{1,\ldots,N\},\ l \in \{0,\ldots,N-1\}, \tag{17}$$

$$X_t^{ss}(k,l) = \frac{4}{N^2}C(k)C(l)\sum_{m,n=0}^{N-1} x_t(m,n)\sin\Big[\frac{k\pi}{N}(m+0.5)\Big]\sin\Big[\frac{l\pi}{N}(n+0.5)\Big], \quad k,l \in \{1,\ldots,N\}, \tag{18}$$

or symbolically,

$$X_t^{cc} = \mathrm{DCCTII}(x_t), \quad X_t^{cs} = \mathrm{DCSTII}(x_t), \quad X_t^{sc} = \mathrm{DSCTII}(x_t), \quad X_t^{ss} = \mathrm{DSSTII}(x_t).$$

Fig. 3. Illustration of the concept of DCT-based motion estimation in one dimensional case: (a) 1-D motion estimation, (b) shift right, (c) shift left.

Fig. 4. An object moves translationally by $m_1$ in X direction and $n_1$ in Y direction as viewed on the camera plane.

In the same fashion, the two dimensional DCT coefficients of the first kind (2D-DCT-I) are calculated based on 1D-DCT/DST-I kernels:

$$Z_{t-1}^{cc}(k,l) = \frac{4}{N^2}C(k)C(l)\sum_{m,n=0}^{N-1} x_{t-1}(m,n)\cos\Big[\frac{k\pi}{N}\,m\Big]\cos\Big[\frac{l\pi}{N}\,n\Big], \quad k,l \in \{0,\ldots,N\}, \tag{19}$$

$$Z_{t-1}^{cs}(k,l) = \frac{4}{N^2}C(k)C(l)\sum_{m,n=0}^{N-1} x_{t-1}(m,n)\cos\Big[\frac{k\pi}{N}\,m\Big]\sin\Big[\frac{l\pi}{N}\,n\Big], \quad k \in \{0,\ldots,N\},\ l \in \{1,\ldots,N-1\}, \tag{20}$$

$$Z_{t-1}^{sc}(k,l) = \frac{4}{N^2}C(k)C(l)\sum_{m,n=0}^{N-1} x_{t-1}(m,n)\sin\Big[\frac{k\pi}{N}\,m\Big]\cos\Big[\frac{l\pi}{N}\,n\Big], \quad k \in \{1,\ldots,N-1\},\ l \in \{0,\ldots,N\}, \tag{21}$$

$$Z_{t-1}^{ss}(k,l) = \frac{4}{N^2}C(k)C(l)\sum_{m,n=0}^{N-1} x_{t-1}(m,n)\sin\Big[\frac{k\pi}{N}\,m\Big]\sin\Big[\frac{l\pi}{N}\,n\Big], \quad k,l \in \{1,\ldots,N-1\}, \tag{22}$$

or symbolically,

$$Z_{t-1}^{cc} = \mathrm{DCCTI}(x_{t-1}), \quad Z_{t-1}^{cs} = \mathrm{DCSTI}(x_{t-1}), \quad Z_{t-1}^{sc} = \mathrm{DSCTI}(x_{t-1}), \quad Z_{t-1}^{ss} = \mathrm{DSSTI}(x_{t-1}).$$

Fig. 5. Block diagram of DXT-ME: (a) flowchart, (b) structure.

Similar to the one dimensional case, assuming that only translational motion is allowed, one can derive a set of equations to relate the DCT coefficients of $x_{t-1}(m,n)$ with those of $x_t(m,n)$ in the same way as in (3) and (4):

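$$\mathbf{Z}_{t-1}(k,l)\cdot\boldsymbol{\theta}(k,l) = \mathbf{x}_t(k,l), \quad \text{for } k,l \in \mathcal{N}, \tag{23}$$

where $\mathcal{N} = \{1,\ldots,N-1\}$,

$$\mathbf{Z}_{t-1}(k,l) = \begin{bmatrix} Z_{t-1}^{cc}(k,l) & -Z_{t-1}^{cs}(k,l) & -Z_{t-1}^{sc}(k,l) & Z_{t-1}^{ss}(k,l) \\ Z_{t-1}^{cs}(k,l) & Z_{t-1}^{cc}(k,l) & -Z_{t-1}^{ss}(k,l) & -Z_{t-1}^{sc}(k,l) \\ Z_{t-1}^{sc}(k,l) & -Z_{t-1}^{ss}(k,l) & Z_{t-1}^{cc}(k,l) & -Z_{t-1}^{cs}(k,l) \\ Z_{t-1}^{ss}(k,l) & Z_{t-1}^{sc}(k,l) & Z_{t-1}^{cs}(k,l) & Z_{t-1}^{cc}(k,l) \end{bmatrix}, \tag{24}$$

$$\boldsymbol{\theta}(k,l) = \begin{bmatrix} g_{m_1n_1}^{CC}(k,l) \\ g_{m_1n_1}^{CS}(k,l) \\ g_{m_1n_1}^{SC}(k,l) \\ g_{m_1n_1}^{SS}(k,l) \end{bmatrix} = \begin{bmatrix} \cos\frac{k\pi}{N}(m_1+0.5)\,\cos\frac{l\pi}{N}(n_1+0.5) \\ \cos\frac{k\pi}{N}(m_1+0.5)\,\sin\frac{l\pi}{N}(n_1+0.5) \\ \sin\frac{k\pi}{N}(m_1+0.5)\,\cos\frac{l\pi}{N}(n_1+0.5) \\ \sin\frac{k\pi}{N}(m_1+0.5)\,\sin\frac{l\pi}{N}(n_1+0.5) \end{bmatrix}, \tag{25}$$

$$\mathbf{x}_t(k,l) = \begin{bmatrix} X_t^{cc}(k,l) & X_t^{cs}(k,l) & X_t^{sc}(k,l) & X_t^{ss}(k,l) \end{bmatrix}^T. \tag{26}$$

Here $\mathbf{Z}_{t-1}(k,l) \in \mathbb{R}^{4\times 4}$ is the system matrix of DXT-ME at $(k,l)$. At the boundaries of each block in the transform domain, the DCT coefficients of $x_{t-1}(m,n)$ and $x_t(m,n)$ have a one dimensional relationship as given below:

$$\begin{bmatrix} Z_{t-1}^{cc}(k,l) & -Z_{t-1}^{cs}(k,l) \\ Z_{t-1}^{cs}(k,l) & Z_{t-1}^{cc}(k,l) \end{bmatrix} \begin{bmatrix} \cos\frac{l\pi}{N}(n_1+0.5) \\ \sin\frac{l\pi}{N}(n_1+0.5) \end{bmatrix} = \begin{bmatrix} X_t^{cc}(k,l) \\ X_t^{cs}(k,l) \end{bmatrix}, \quad k = 0,\ l \in \mathcal{N}, \tag{27}$$

$$\begin{bmatrix} Z_{t-1}^{cc}(k,l) & -Z_{t-1}^{sc}(k,l) \\ Z_{t-1}^{sc}(k,l) & Z_{t-1}^{cc}(k,l) \end{bmatrix} \begin{bmatrix} \cos\frac{k\pi}{N}(m_1+0.5) \\ \sin\frac{k\pi}{N}(m_1+0.5) \end{bmatrix} = \begin{bmatrix} X_t^{cc}(k,l) \\ X_t^{sc}(k,l) \end{bmatrix}, \quad l = 0,\ k \in \mathcal{N}, \tag{28}$$

$$\begin{bmatrix} Z_{t-1}^{cc}(k,l) & -Z_{t-1}^{cs}(k,l) \\ Z_{t-1}^{cs}(k,l) & Z_{t-1}^{cc}(k,l) \end{bmatrix} \begin{bmatrix} \cos\frac{l\pi}{N}(n_1+0.5) \\ \sin\frac{l\pi}{N}(n_1+0.5) \end{bmatrix} = (-1)^{m_1} \begin{bmatrix} X_t^{sc}(k,l) \\ X_t^{ss}(k,l) \end{bmatrix}, \quad k = N,\ l \in \mathcal{N}, \tag{29}$$

$$\begin{bmatrix} Z_{t-1}^{cc}(k,l) & -Z_{t-1}^{sc}(k,l) \\ Z_{t-1}^{sc}(k,l) & Z_{t-1}^{cc}(k,l) \end{bmatrix} \begin{bmatrix} \cos\frac{k\pi}{N}(m_1+0.5) \\ \sin\frac{k\pi}{N}(m_1+0.5) \end{bmatrix} = (-1)^{n_1} \begin{bmatrix} X_t^{cs}(k,l) \\ X_t^{ss}(k,l) \end{bmatrix}, \quad l = N,\ k \in \mathcal{N}, \tag{30}$$

$$(-1)^{n_1} Z_{t-1}^{cc}(k,l) = X_t^{cs}(k,l), \quad k = 0,\ l = N, \tag{31}$$

$$(-1)^{m_1} Z_{t-1}^{cc}(k,l) = X_t^{sc}(k,l), \quad k = N,\ l = 0. \tag{32}$$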

In a two dimensional space, an object may move in four possible directions: northeast (NE: $m_1 > 0$, $n_1 > 0$), northwest (NW: $m_1 < 0$, $n_1 > 0$), southeast (SE: $m_1 > 0$, $n_1 < 0$), and southwest (SW: $m_1 < 0$, $n_1 < 0$). As explained in Section II, the orthogonal equation for the DST-II kernel in (9) can be applied to the pseudo phase $\hat{g}_m^s(k)$ to determine the sign of $m$ (i.e. the direction of the shift). In order to detect the signs of both $m_1$ and $n_1$ (or equivalently the direction of motion), it becomes obvious from the observation in the one dimensional case that it is necessary to compute the pseudo phases $\hat{g}_{m_1n_1}^{SC}(\cdot,\cdot)$ and $\hat{g}_{m_1n_1}^{CS}(\cdot,\cdot)$ so that the signs of $m_1$ and $n_1$ can be determined from $\hat{g}_{m_1n_1}^{SC}(\cdot,\cdot)$ and $\hat{g}_{m_1n_1}^{CS}(\cdot,\cdot)$, respectively. By taking the block boundary equations (27)-(32) into consideration, we define two pseudo phase functions as follows:


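$$f_{m_1n_1}(k,l) = \begin{cases} \hat{g}_{m_1n_1}^{CS}(k,l), & \text{for } k,l \in \{1,\ldots,N-1\}, \\[1ex] \dfrac{1}{\sqrt{2}}\,\dfrac{Z_{t-1}^{cc}(k,l)X_t^{cs}(k,l) - Z_{t-1}^{cs}(k,l)X_t^{cc}(k,l)}{(Z_{t-1}^{cc}(k,l))^2 + (Z_{t-1}^{cs}(k,l))^2}, & \text{for } k = 0,\ l \in \{1,\ldots,N-1\}, \\[2ex] \dfrac{1}{\sqrt{2}}\,\dfrac{Z_{t-1}^{cc}(k,l)X_t^{cs}(k,l) + Z_{t-1}^{sc}(k,l)X_t^{ss}(k,l)}{(Z_{t-1}^{cc}(k,l))^2 + (Z_{t-1}^{sc}(k,l))^2}, & \text{for } l = N,\ k \in \{1,\ldots,N-1\}, \\[2ex] \dfrac{1}{2}\,\dfrac{X_t^{cs}(k,l)}{Z_{t-1}^{cc}(k,l)}, & \text{for } k = 0,\ l = N, \text{ and } Z_{t-1}^{cc}(k,l) \neq 0, \\[2ex] \dfrac{1}{2}, & \text{for } k = 0,\ l = N, \text{ and } Z_{t-1}^{cc}(k,l) = 0; \end{cases} \tag{33}$$

$$g_{m_1n_1}(k,l) = \begin{cases} \hat{g}_{m_1n_1}^{SC}(k,l), & \text{for } k,l \in \{1,\ldots,N-1\}, \\[1ex] \dfrac{1}{\sqrt{2}}\,\dfrac{Z_{t-1}^{cc}(k,l)X_t^{sc}(k,l) - Z_{t-1}^{sc}(k,l)X_t^{cc}(k,l)}{(Z_{t-1}^{cc}(k,l))^2 + (Z_{t-1}^{sc}(k,l))^2}, & \text{for } l = 0,\ k \in \{1,\ldots,N-1\}, \\[2ex] \dfrac{1}{\sqrt{2}}\,\dfrac{Z_{t-1}^{cc}(k,l)X_t^{sc}(k,l) + Z_{t-1}^{cs}(k,l)X_t^{ss}(k,l)}{(Z_{t-1}^{cc}(k,l))^2 + (Z_{t-1}^{cs}(k,l))^2}, & \text{for } k = N,\ l \in \{1,\ldots,N-1\}, \\[2ex] \dfrac{1}{2}\,\dfrac{X_t^{sc}(k,l)}{Z_{t-1}^{cc}(k,l)}, & \text{for } k = N,\ l = 0, \text{ and } Z_{t-1}^{cc}(k,l) \neq 0, \\[2ex] \dfrac{1}{2}, & \text{for } k = N,\ l = 0, \text{ and } Z_{t-1}^{cc}(k,l) = 0. \end{cases} \tag{34}$$

The boundary cases of (33) follow from (27), (30), and (31), and those of (34) from (28), (29), and (32). These two pseudo phase functions pass through 2D-IDCT-II coders (IDCSTII and IDSCTII) to generate two functions, $DCS(\cdot,\cdot)$ and $DSC(\cdot,\cdot)$, in view of the orthogonal property of DCT-II and DST-II in (9) and (10):

$$\begin{aligned} DCS(m,n) &= \mathrm{IDCSTII}(f_{m_1n_1}) \\ &= \frac{4}{N^2}\sum_{k=0}^{N-1}\sum_{l=1}^{N} C(k)C(l)\,f_{m_1n_1}(k,l)\cos\frac{k\pi}{N}\big(m+\tfrac{1}{2}\big)\sin\frac{l\pi}{N}\big(n+\tfrac{1}{2}\big) \\ &= [\delta(m-m_1) + \delta(m+m_1+1)]\cdot[\delta(n-n_1) - \delta(n+n_1+1)], \end{aligned} \tag{35}$$

$$\begin{aligned} DSC(m,n) &= \mathrm{IDSCTII}(g_{m_1n_1}) \\ &= \frac{4}{N^2}\sum_{k=1}^{N}\sum_{l=0}^{N-1} C(k)C(l)\,g_{m_1n_1}(k,l)\sin\frac{k\pi}{N}\big(m+\tfrac{1}{2}\big)\cos\frac{l\pi}{N}\big(n+\tfrac{1}{2}\big) \\ &= [\delta(m-m_1) - \delta(m+m_1+1)]\cdot[\delta(n-n_1) + \delta(n+n_1+1)]. \end{aligned} \tag{36}$$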

Fig. 6. How the direction of motion is determined based on the sign of the peak value: (a) DCS, (b) DSC. Legend: • positive peak value, ○ negative peak value.


By the same argument as in the one dimensional case, the 2D-IDCT-II coders limit the observable index space $\{(i,j) : i,j = 0,\ldots,N-1\}$ of DCS and DSC to the first quadrant of the entire index space, shown as gray regions in Fig. 6, which depicts (35) and (36). Similar to the one dimensional case, if $m_1$ is positive, the observable peak value of $DSC(m,n)$ will be positive regardless of the sign of $n_1$, since $DSC(m,n) = \delta(m-m_1)\cdot[\delta(n-n_1) + \delta(n+n_1+1)]$ in the observable index space. Likewise, if $m_1$ is negative, the observable peak value of $DSC(m,n)$ will be negative because $DSC(m,n) = -\delta(m+m_1+1)\cdot[\delta(n-n_1) + \delta(n+n_1+1)]$ in the gray region. As a result, the sign of the observable peak value of DSC determines the sign of $m_1$. The same reasoning applies to DCS in the determination of the sign of $n_1$. The estimated displacement, $\hat{d} = (\hat{m}_1, \hat{n}_1)$, can thus be found by locating the peaks of DCS and DSC over $\{0,\ldots,N-1\}^2$ or over an index range of interest, usually $\Phi = \{0,\ldots,N/2\}^2$ for slow motion. How the peak signs determine the direction of movement is summarized in Table I.

TABLE I
Determination of direction of movement $(m_1, n_1)$ from the signs of DSC and DCS

Sign of DSC Peak    Sign of DCS Peak    Peak Index                   Direction of Motion
+                   +                   $(m_1, n_1)$                 northeast
+                   -                   $(m_1, -(n_1+1))$            southeast
-                   +                   $(-(m_1+1), n_1)$            northwest
-                   -                   $(-(m_1+1), -(n_1+1))$       southwest

Once the direction is found, $\hat{d}$ can be estimated accordingly:

$$\hat{m}_1 = \begin{cases} i_{DSC} = i_{DCS}, & \text{if } DSC(i_{DSC}, j_{DSC}) > 0, \\ -(i_{DSC}+1) = -(i_{DCS}+1), & \text{if } DSC(i_{DSC}, j_{DSC}) < 0, \end{cases} \tag{37}$$

$$\hat{n}_1 = \begin{cases} j_{DCS} = j_{DSC}, & \text{if } DCS(i_{DCS}, j_{DCS}) > 0, \\ -(j_{DCS}+1) = -(j_{DSC}+1), & \text{if } DCS(i_{DCS}, j_{DCS}) < 0, \end{cases} \tag{38}$$

where

$$(i_{DCS}, j_{DCS}) = \arg\max_{m,n \in \Phi} |DCS(m,n)|, \tag{39}$$

$$(i_{DSC}, j_{DSC}) = \arg\max_{m,n \in \Phi} |DSC(m,n)|. \tag{40}$$
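
The decoding rule of Table I and (37)-(40) can be sketched as follows. The function names `peak` and `displacement` are illustrative, the surfaces are assumed to be given as square nested lists, and the two peak indices are assumed to agree (the noiseless case).

```python
def peak(surface, size):
    """Index (i, j) of the largest magnitude over {0..size-1}^2, as in (39)-(40)."""
    return max(
        ((i, j) for i in range(size) for j in range(size)),
        key=lambda ij: abs(surface[ij[0]][ij[1]]),
    )

def displacement(dsc, dcs, size):
    """Apply (37) and (38): the sign of the DSC peak fixes the sign of m1,
    and the sign of the DCS peak fixes the sign of n1."""
    i_dsc, j_dsc = peak(dsc, size)
    i_dcs, j_dcs = peak(dcs, size)
    m1 = i_dsc if dsc[i_dsc][j_dsc] > 0 else -(i_dsc + 1)
    n1 = j_dcs if dcs[i_dcs][j_dcs] > 0 else -(j_dcs + 1)
    return m1, n1
```

For a southeast motion of $(5, -3)$, (35) and (36) place both observable peaks at index $(5, 2)$, positive in DSC and negative in DCS, so the rule returns $(5, -(2+1)) = (5, -3)$.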


Normally, these two peak indices are consistent, but in noisy circumstances they may not agree. In this case, an arbitration rule must be made to pick the best index $(i_D, j_D)$ in terms of minimum nonpeak-to-peak ratio (NPR):

$$(i_D, j_D) = \begin{cases} (i_{DSC}, j_{DSC}), & \text{if } NPR(DSC) \le NPR(DCS), \\ (i_{DCS}, j_{DCS}), & \text{if } NPR(DSC) > NPR(DCS). \end{cases} \tag{41}$$

This index $(i_D, j_D)$ will then be used to determine $\hat{d}$ by (37) and (38). Here NPR is defined as the ratio of the average of all absolute non-peak values to the absolute peak value. Thus $0 \le NPR \le 1$, and for a pure impulse function, $NPR = 0$. Choosing the better of the two indices in this way is found empirically to improve the noise immunity of this estimator.
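
A small sketch of the NPR measure and the arbitration rule (41) follows. The function names and the handling of an all-zero surface are assumptions made only for this illustration.

```python
def npr(surface):
    """Nonpeak-to-peak ratio: mean absolute non-peak value over absolute peak value."""
    values = [abs(v) for row in surface for v in row]
    top = max(values)
    if top == 0 or len(values) < 2:
        return 1.0  # assumption: treat a flat surface as the worst case
    return (sum(values) - top) / (len(values) - 1) / top

def arbitrate(dsc, dcs, idx_dsc, idx_dcs):
    """Rule (41): keep the peak index of the surface with the smaller NPR."""
    return idx_dsc if npr(dsc) <= npr(dcs) else idx_dcs
```

A pure impulse gives an NPR of 0, while a surface whose non-peak values are half the peak gives 0.5, so the rule prefers the cleaner of the two surfaces.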

In situations where slow motion is preferred, it is better to search for the peak value in a zigzag way, as widely used in DCT-based hybrid video coding [20], [21]. Starting from the index $(0,0)$, scan all the DCS (or DSC) values in zigzag order and mark the point as the new peak index if the value at that point $(i,j)$ is larger than the current peak value by more than a preset threshold $\theta$:

$$(i_{DCS}, j_{DCS}) = (i,j) \quad \text{if } DCS(i,j) > DCS(i_{DCS}, j_{DCS}) + \theta, \tag{42}$$

$$(i_{DSC}, j_{DSC}) = (i,j) \quad \text{if } DSC(i,j) > DSC(i_{DSC}, j_{DSC}) + \theta. \tag{43}$$

In this way, large spurious spikes at the higher index points will not affect the performance of the estimator, which further improves its noise immunity.
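
The thresholded zigzag search might be sketched as below. Two points are assumptions rather than statements of the original method: the scan order is the anti-diagonal order familiar from DCT coefficient coding, and the comparison is made on magnitudes so that negative peaks (which carry the direction information) can also be selected. The function names are illustrative.

```python
def zigzag_indices(size):
    """Yield (i, j) over a size x size grid along anti-diagonals, starting at (0, 0)."""
    for s in range(2 * size - 1):
        lo, hi = max(0, s - size + 1), min(s, size - 1)
        # alternate the direction of travel on successive anti-diagonals
        rows = range(lo, hi + 1) if s % 2 else range(hi, lo - 1, -1)
        for i in rows:
            yield i, s - i

def zigzag_peak(surface, threshold):
    """Return the peak index, moving to a later point only if it beats the
    current peak magnitude by more than the threshold (cf. (42) and (43))."""
    best = (0, 0)
    for i, j in zigzag_indices(len(surface)):
        if abs(surface[i][j]) > abs(surface[best[0]][best[1]]) + threshold:
            best = (i, j)
    return best
```

With a threshold of 0.1, a spike of 1.05 at a high index does not displace an earlier peak of 1.0 near the origin; with a threshold of 0 it does.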


Fig. 7. DXT-ME performed on the images of an object moving in the direction (5, -3) with additive white Gaussian noise at SNR = 10 dB in a completely dark environment: (a) original inputs $x_1$ and $x_2$, (b) noise added, (c) $f$ and $g$, (d) DSC and DCS.

Fig. 7 demonstrates this DXT-ME algorithm. Images of a rectangularly-shaped moving object with arbitrary texture are generated as in Fig. 7(a) and corrupted by additive white Gaussian noise at SNR = 10 dB as in Fig. 7(b). The resulting pseudo phase functions $f$ and $g$, as well as DCS and DSC, are depicted in Fig. 7(c) and (d), respectively. Despite the noisy input images, large peaks can be seen clearly in Fig. 7(d) above the rough surfaces caused by noise. The positions of these peaks give us an accurate motion estimate (5, -3).

A.  Stability  of  DXT-ME  Motion  Estimator 

The stability of this motion estimator depends upon the determinant $|\mathbf{Z}_{t-1}(k,l)|$ of the system matrix in (23). A zero or near-zero value of $|\mathbf{Z}_{t-1}(k,l)|$ may jeopardize the performance. However, it can be shown analytically that this determinant will rarely be zero. Some algebraic manipulations on the determinant of $\mathbf{Z}_{t-1}(k,l)$ give us this closed form of $|\mathbf{Z}_{t-1}(k,l)|$:

$$\begin{aligned} |\mathbf{Z}_{t-1}(k,l)| &= \begin{vmatrix} Z_{t-1}^{cc}(k,l) & -Z_{t-1}^{cs}(k,l) & -Z_{t-1}^{sc}(k,l) & Z_{t-1}^{ss}(k,l) \\ Z_{t-1}^{cs}(k,l) & Z_{t-1}^{cc}(k,l) & -Z_{t-1}^{ss}(k,l) & -Z_{t-1}^{sc}(k,l) \\ Z_{t-1}^{sc}(k,l) & -Z_{t-1}^{ss}(k,l) & Z_{t-1}^{cc}(k,l) & -Z_{t-1}^{cs}(k,l) \\ Z_{t-1}^{ss}(k,l) & Z_{t-1}^{sc}(k,l) & Z_{t-1}^{cs}(k,l) & Z_{t-1}^{cc}(k,l) \end{vmatrix} \\ &= \big(Z_{t-1}^{cc}(k,l)^2 - Z_{t-1}^{ss}(k,l)^2\big)^2 + \big(Z_{t-1}^{cs}(k,l)^2 - Z_{t-1}^{sc}(k,l)^2\big)^2 \\ &\quad + 2\big(Z_{t-1}^{cc}(k,l)Z_{t-1}^{cs}(k,l) + Z_{t-1}^{sc}(k,l)Z_{t-1}^{ss}(k,l)\big)^2 \\ &\quad + 2\big(Z_{t-1}^{cc}(k,l)Z_{t-1}^{sc}(k,l) + Z_{t-1}^{cs}(k,l)Z_{t-1}^{ss}(k,l)\big)^2, \end{aligned} \tag{44}$$

in which $|\mathbf{Z}_{t-1}(k,l)| = 0$ implies that

$$Z_{t-1}^{cc}(k,l) = \pm Z_{t-1}^{ss}(k,l), \quad Z_{t-1}^{cs}(k,l) = \pm Z_{t-1}^{sc}(k,l),$$

$$Z_{t-1}^{cc}(k,l)Z_{t-1}^{cs}(k,l) = -Z_{t-1}^{sc}(k,l)Z_{t-1}^{ss}(k,l), \quad Z_{t-1}^{cc}(k,l)Z_{t-1}^{sc}(k,l) = -Z_{t-1}^{cs}(k,l)Z_{t-1}^{ss}(k,l).$$

However, the last two equalities allow only two possible conditions:

$$\text{either } Z_{t-1}^{cs}(k,l) = Z_{t-1}^{sc}(k,l), \quad Z_{t-1}^{ss}(k,l) = -Z_{t-1}^{cc}(k,l), \tag{45}$$

$$\text{or } Z_{t-1}^{cs}(k,l) = -Z_{t-1}^{sc}(k,l), \quad Z_{t-1}^{ss}(k,l) = Z_{t-1}^{cc}(k,l). \tag{46}$$

Alternatively, in their explicit compact forms,

$$\sum_{m,n=0}^{N-1} x_{t-1}(m,n)\sin\Big[\frac{\pi}{N}(km \mp ln)\Big] = 0, \tag{47}$$

$$\sum_{m,n=0}^{N-1} x_{t-1}(m,n)\cos\Big[\frac{\pi}{N}(km \mp ln)\Big] = 0. \tag{48}$$

Here the minus signs in (47) and (48) correspond to the first condition in (45) and the plus signs correspond to (46). Satisfying either condition requires $x_{t-1}(m,n) = 0$. Therefore it is very unlikely that this determinant is zero and, as such, the DXT-ME is a stable estimator. Even so, if $|\mathbf{Z}_{t-1}(k,l)| = 0$ does occur or $|\mathbf{Z}_{t-1}(k,l)|$ is less than a threshold, then we can let $f(k,l) = g(k,l) = 1$, which is equivalent to the situation when $x_{t-1}(m,n) = 0$. In this way, the catastrophic effect of the computational precision of a certain implementation on the stability of DXT-ME will be kept to a minimum or even eliminated.
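
The closed form (44) can be checked numerically against a direct determinant of the system matrix in (24). The sketch below is only a sanity check; the function names are illustrative and the determinant is computed by plain cofactor expansion, which is adequate for a 4 x 4 matrix.

```python
def det(matrix):
    """Determinant by Laplace expansion along the first row (fine for 4 x 4)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0.0
    for col, value in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * value * det(minor)
    return total

def system_matrix(cc, cs, sc, ss):
    """System matrix (24) built from the four DCT-I/DST-I coefficients at one (k, l)."""
    return [
        [cc, -cs, -sc, ss],
        [cs, cc, -ss, -sc],
        [sc, -ss, cc, -cs],
        [ss, sc, cs, cc],
    ]

def closed_form(cc, cs, sc, ss):
    """Right-hand side of (44)."""
    return ((cc ** 2 - ss ** 2) ** 2 + (cs ** 2 - sc ** 2) ** 2
            + 2 * (cc * cs + sc * ss) ** 2 + 2 * (cc * sc + cs * ss) ** 2)
```

Because (44) is a sum of squares, the determinant is never negative, and it vanishes only when all four squared terms vanish together, for example at $Z^{cc} = Z^{ss} = Z^{cs} = 1$, $Z^{sc} = -1$.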

B.  Motion  Estimation  In  A  Uniformly  Bright  Background 

What  if  an  object  is  moving  in  a  uniformly  bright  background  instead  of  a  completely  dark  environ¬ 
ment?  It  can  be  shown  analytically  and  empirically  that  uniformly  bright  background  introduces  only 
very  small  spikes  which  does  not  affect  the  accuracy  of  the  estimate.  Suppose  that  {xt-i(m,  n)}  and 
{xt(m,n)}  are  pixel  values  of  2  consecutive  frames  of  an  object  displaced  by  (mi, raj.)  on  a  uniformly 
bright  background.  Then  let  yt(m,n)  and  n)  be  the  pixel  value  of  xt{m,n)  and  xt-i{m,n) 

subtracted  by  the  background  pixel  value  c  (c  >  0)  respectively: 


yt(m,n)  =  xt{m,n)  -  c, 


(49) 
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yt-i{m,n)  =  xt-i(m,n)  -  c.  (50) 

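In this way, $\{y_{t-1}(m,n)\}$ and $\{y_t(m,n)\}$ can be considered as the images of an object moving in a dark
environment. Denote $\mathbf{Z}_{t-1}(k,l)$ as the system matrix of the input image $x_{t-1}$ and
$\mathbf{U}_{t-1}(k,l)$ as that of $y_{t-1}$ for $k,l \in \mathcal{N}$. Also let $\mathbf{x}_t(k,l)$ be the vector of the
2D-DCT-II coefficients of $x_t$ and $\mathbf{y}_t(k,l)$ be the vector for $y_t$. Applying the DXT-ME algorithm
to both situations, we have, for $k,l \in \mathcal{N}$,

$$\mathbf{Z}_{t-1}(k,l) \cdot \boldsymbol{\theta}_{m_1 n_1}(k,l) = \mathbf{x}_t(k,l), \tag{51}$$

$$\mathbf{U}_{t-1}(k,l) \cdot \boldsymbol{\phi}_{m_1 n_1}(k,l) = \mathbf{y}_t(k,l). \tag{52}$$

Here $\boldsymbol{\phi}_{m_1 n_1}(k,l)$ is the vector of the computed pseudo phases for the case of dark background and thus

$$\boldsymbol{\phi}_{m_1 n_1}(k,l) = [g^{CC}_{m_1 n_1}(k,l),\ g^{CS}_{m_1 n_1}(k,l),\ g^{SC}_{m_1 n_1}(k,l),\ g^{SS}_{m_1 n_1}(k,l)]^T,$$

but $\boldsymbol{\theta}_{m_1 n_1}(k,l)$ is for uniformly bright background and

$$\boldsymbol{\theta}_{m_1 n_1}(k,l) = [\hat{g}^{CC}_{m_1 n_1}(k,l),\ \hat{g}^{CS}_{m_1 n_1}(k,l),\ \hat{g}^{SC}_{m_1 n_1}(k,l),\ \hat{g}^{SS}_{m_1 n_1}(k,l)]^T \neq \boldsymbol{\phi}_{m_1 n_1}(k,l).$$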

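Starting from the definition of each element in $\mathbf{Z}_{t-1}(k,l)$ and $\mathbf{x}_t(k,l)$, we obtain

$$\mathbf{Z}_{t-1}(k,l) = \mathbf{U}_{t-1}(k,l) + c \cdot \mathbf{D}(k,l), \tag{53}$$

$$\mathbf{x}_t(k,l) = \mathbf{y}_t(k,l) + c \cdot \mathbf{c}(k,l), \tag{54}$$

where $\mathbf{D}(k,l)$ is the system matrix with $\{d(m,n) = 1, \forall m,n \in \{0,\ldots,N-1\}\}$ as input and
$\mathbf{c}(k,l)$ is the vector of the 2D-DCT-II coefficients of $d(m,n)$. Substituting (53) and (54) into (52)
and using (51), we get

$$\mathbf{Z}_{t-1}(k,l) \cdot \boldsymbol{\theta}_{m_1 n_1}(k,l) = \mathbf{Z}_{t-1}(k,l) \cdot \boldsymbol{\phi}_{m_1 n_1}(k,l) + c \cdot [\mathbf{c}(k,l) - \mathbf{D}(k,l) \cdot \boldsymbol{\phi}_{m_1 n_1}(k,l)]. \tag{55}$$

Since $\mathbf{c}(k,l) = \mathbf{D}(k,l) \cdot \boldsymbol{\phi}_{00}(k,l)$, (55) becomes

$$\boldsymbol{\theta}_{m_1 n_1}(k,l) = \boldsymbol{\phi}_{m_1 n_1}(k,l) + c\,\mathbf{Z}_{t-1}^{-1}(k,l)\,\mathbf{D}(k,l)\,[\boldsymbol{\phi}_{00}(k,l) - \boldsymbol{\phi}_{m_1 n_1}(k,l)], \tag{56}$$

provided that $|\mathbf{Z}_{t-1}(k,l)| \neq 0$. Similar results can also be found at block boundaries. Referring to (24),
we know that $\mathbf{D}(k,l)$ is composed of $D^{cc}(k,l)$, $D^{cs}(k,l)$, $D^{sc}(k,l)$, and $D^{ss}(k,l)$, each of which is a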
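From the above equations, we can see that $D^{c}(k) = D^{s}(k) = 0$ if $k$ is even, while both take nonzero
values for odd $k > 0$. Hence, $D^{cc}(k,l) = D^{cs}(k,l) = D^{sc}(k,l) = D^{ss}(k,l) = 0$ if either
$k$ or $l$ is even. As a result, $\boldsymbol{\theta}_{m_1 n_1}(k,l) - \boldsymbol{\phi}_{m_1 n_1}(k,l) = 0$ if either $k$ or $l$ is even.
For odd indices $k$ and $l$, it is possible to find a constant $s$ and a matrix $\mathbf{N}(k,l) \in R^{4 \times 4}$ such that
$\mathbf{U}_{t-1}(k,l) = s[\mathbf{D}(k,l) - \mathbf{N}(k,l)]$ and $|\mathbf{N}(k,l)\mathbf{D}^{-1}(k,l)| < 1$ for $|\mathbf{D}(k,l)| \neq 0$.
Therefore, for $|\frac{s}{s+c}\mathbf{N}(k,l)\mathbf{D}^{-1}(k,l)| < 1$,

$$c\,\mathbf{Z}_{t-1}^{-1}(k,l)\,\mathbf{D}(k,l) = \frac{c}{s+c}\left[\mathbf{I} - \frac{s}{s+c}\mathbf{N}(k,l)\mathbf{D}^{-1}(k,l)\right]^{-1} \tag{57}$$

$$= \frac{c}{s+c}\left\{\mathbf{I} + \frac{s}{s+c}\mathbf{N}(k,l)\mathbf{D}^{-1}(k,l) + \left[\frac{s}{s+c}\mathbf{N}(k,l)\mathbf{D}^{-1}(k,l)\right]^2 + \cdots\right\}. \tag{58}$$

If we lump all the high-order terms of $\frac{s}{s+c}\mathbf{N}(k,l)\mathbf{D}^{-1}(k,l)$ in one term $\mathbf{H}(k,l)$, then

$$\boldsymbol{\theta}_{m_1 n_1}(k,l) = \boldsymbol{\phi}_{m_1 n_1}(k,l) + \left[\frac{c}{s+c} + \mathbf{H}(k,l)\right][\boldsymbol{\phi}_{00}(k,l) - \boldsymbol{\phi}_{m_1 n_1}(k,l)]. \tag{59}$$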

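Usually, $0 \le c, s \le 255$ for the maximum gray level equal to 255. Typically $s = 1$. For moderately large
$c$, $\mathbf{H}(k,l)$ is very small. Define the subsampled version of the pseudo-phase function $\boldsymbol{\phi}_{ab}(k,l)$ as

$$\boldsymbol{\lambda}_{ab}(k,l) = \begin{cases} \boldsymbol{\phi}_{ab}(k,l), & \text{if both } k \text{ and } l \text{ are odd}, \\ 0, & \text{otherwise}. \end{cases} \tag{60}$$

Then

$$\boldsymbol{\theta}_{m_1 n_1}(k,l) = \boldsymbol{\phi}_{m_1 n_1}(k,l) + \left[\frac{c}{s+c} + \mathbf{H}(k,l)\right]\{\boldsymbol{\lambda}_{00} - \boldsymbol{\lambda}_{m_1 n_1}\}. \tag{61}$$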

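Recall that a 2D-IDCT-II operation on $\boldsymbol{\phi}_{m_1 n_1}(k,l)$ or $\boldsymbol{\phi}_{00}(k,l)$ produces
$\boldsymbol{\delta}_{m_1 n_1}$ or $\boldsymbol{\delta}_{00}$, respectively, where

$$\boldsymbol{\delta}_{ab}(m,n) = \begin{bmatrix}
(\delta(m-a) + \delta(m+a+1))(\delta(n-b) + \delta(n+b+1)) \\
(\delta(m-a) + \delta(m+a+1))(\delta(n-b) - \delta(n+b+1)) \\
(\delta(m-a) - \delta(m+a+1))(\delta(n-b) + \delta(n+b+1)) \\
(\delta(m-a) - \delta(m+a+1))(\delta(n-b) - \delta(n+b+1))
\end{bmatrix}.$$

Therefore,

$$\mathbf{d}(m,n) = \text{2D-IDCT-II}\{\boldsymbol{\theta}_{m_1 n_1}\} = \boldsymbol{\delta}_{m_1 n_1}(m,n) + \frac{c}{s+c}\,\text{2D-IDCT-II}\{\boldsymbol{\lambda}_{00} - \boldsymbol{\lambda}_{m_1 n_1}\} + \mathbf{n}(m,n), \tag{62}$$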

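where $\mathbf{n}$ is the noise term contributed from 2D-IDCT-II$\{\mathbf{H}(k,l)[\boldsymbol{\lambda}_{00} - \boldsymbol{\lambda}_{m_1 n_1}]\}$.
Because $\boldsymbol{\lambda}_{ab}$ is equivalent to downsampling $\boldsymbol{\phi}_{ab}$ in a 2D index space and it is known
that downsampling produces in the transform domain mirror images of magnitude only one-fourth of the
original and of sign depending on the transform function, we obtain

$$\mathbf{E}_{m_1 n_1}(m,n) = \text{2D-IDCT-II}\{\boldsymbol{\lambda}_{m_1 n_1}\} = \frac{1}{4}\big[\boldsymbol{\delta}_{m_1 n_1}(m,n) + \text{diag}(\boldsymbol{\xi}_1) \cdot \boldsymbol{\delta}_{(N-1-m_1) n_1}(m,n)$$
$$\qquad + \text{diag}(\boldsymbol{\xi}_2) \cdot \boldsymbol{\delta}_{m_1 (N-1-n_1)}(m,n) + \text{diag}(\boldsymbol{\xi}_3) \cdot \boldsymbol{\delta}_{(N-1-m_1)(N-1-n_1)}(m,n)\big], \tag{63}$$

where diag($\cdot$) is the diagonal matrix of a vector and $\boldsymbol{\xi}_i$ ($i = 1, 2, 3$) is a vector consisting of $\pm 1$.
A similar expression can also be established for 2D-IDCT-II$\{\boldsymbol{\lambda}_{00}\}$. In conclusion,

$$\mathbf{d}(m,n) = \boldsymbol{\delta}_{m_1 n_1}(m,n) + \frac{c}{s+c}[\mathbf{E}_{00}(m,n) - \mathbf{E}_{m_1 n_1}(m,n)] + \mathbf{n}(m,n). \tag{64}$$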

The above equation predicts the presence of a very small noise term $\mathbf{n}$ and several small spikes,
$\mathbf{E}_{00}$ and $\mathbf{E}_{m_1 n_1}$, of magnitude moderated by $\frac{c}{4(s+c)}$, which are much smaller than the
displacement peak, as displayed in Fig. 8 (b) and (c). For the case of $c = 3$ in (b), $\mathbf{n}$ is observable but
very small and can be regarded as noise, whereas $\mathbf{n}$ is practically absent in (c), where $c = 255$.


[Figure 8 panels: (a) $f$ and $g$; (b) DSC and DCS; (c) another DSC and DCS]

Fig. 8. (a)(b) An object is moving in the direction (5, -3) in a uniformly bright background ($c = 3$). (c) Another
object is moving northeast (8, 7) for background pixel values $= c = 255$.
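As a small worked example of the two factors that appear in (58) and (64), the snippet below evaluates the series ratio $s/(s+c)$, which governs how fast the high-order terms lumped into $\mathbf{H}$ decay, and the spike scale $c/(4(s+c))$ for the two background levels shown in Fig. 8. The function name is hypothetical, and $s = 1$ is taken from the "typically $s = 1$" remark in the text.

```python
def bright_background_factors(c, s=1.0):
    """Return (series ratio s/(s+c), spike scale c/(4(s+c))) for background level c."""
    ratio = s / (s + c)
    spike = c / (4.0 * (s + c))
    return ratio, spike

for c in (3, 255):
    ratio, spike = bright_background_factors(c)
    print(f"c={c}: s/(s+c)={ratio:.4f}, c/(4(s+c))={spike:.4f}")
```

Under these assumptions the spike scale stays below one quarter of a unit peak in both cases, while the series ratio drops from 0.25 at $c = 3$ to about 0.004 at $c = 255$. This is consistent with the noise term being visible in Fig. 8(b) and practically absent in Fig. 8(c).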


C.  Computational  Issues  and  Complexity 

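The block diagram in Fig. 5(a) shows that a separate 2D-DCT-I is needed in addition to the standard
DCT (2D-DCT-II). This is undesirable from the complexity viewpoint. However, this problem can
be circumvented by considering the point-to-point relationship between 2D-DCT-I and 2D-DCT-II
coefficients in the frequency domain for $k, l \in \mathcal{N}$:

$$\begin{bmatrix} Z^{cc}_{t-1}(k,l) \\ Z^{cs}_{t-1}(k,l) \\ Z^{sc}_{t-1}(k,l) \\ Z^{ss}_{t-1}(k,l) \end{bmatrix} =
\begin{bmatrix}
\cos\frac{k\pi}{2N}\cos\frac{l\pi}{2N} & \cos\frac{k\pi}{2N}\sin\frac{l\pi}{2N} & \sin\frac{k\pi}{2N}\cos\frac{l\pi}{2N} & \sin\frac{k\pi}{2N}\sin\frac{l\pi}{2N} \\
-\cos\frac{k\pi}{2N}\sin\frac{l\pi}{2N} & \cos\frac{k\pi}{2N}\cos\frac{l\pi}{2N} & -\sin\frac{k\pi}{2N}\sin\frac{l\pi}{2N} & \sin\frac{k\pi}{2N}\cos\frac{l\pi}{2N} \\
-\sin\frac{k\pi}{2N}\cos\frac{l\pi}{2N} & -\sin\frac{k\pi}{2N}\sin\frac{l\pi}{2N} & \cos\frac{k\pi}{2N}\cos\frac{l\pi}{2N} & \cos\frac{k\pi}{2N}\sin\frac{l\pi}{2N} \\
\sin\frac{k\pi}{2N}\sin\frac{l\pi}{2N} & -\sin\frac{k\pi}{2N}\cos\frac{l\pi}{2N} & -\cos\frac{k\pi}{2N}\sin\frac{l\pi}{2N} & \cos\frac{k\pi}{2N}\cos\frac{l\pi}{2N}
\end{bmatrix}
\begin{bmatrix} X^{cc}_{t-1}(k,l) \\ X^{cs}_{t-1}(k,l) \\ X^{sc}_{t-1}(k,l) \\ X^{ss}_{t-1}(k,l) \end{bmatrix} \tag{65}$$

where $X^{cc}_{t-1}$, $X^{cs}_{t-1}$, $X^{sc}_{t-1}$, and $X^{ss}_{t-1}$ are the 2D-DCT-II coefficients of the previous frame.
A similar relation also exists for the coefficients at block boundaries. This observation results in the
simple structure in Fig. 5(b), where Block T is a coefficient transformation unit realizing (65).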
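The point-to-point relation realized by Block T can be checked numerically. The sketch below is not the paper's implementation: it assumes unnormalized type-I kernels of the form $\cos(k\pi m/N)$, $\sin(k\pi m/N)$ and type-II kernels of the form $\cos(k\pi(m+\frac{1}{2})/N)$, $\sin(k\pi(m+\frac{1}{2})/N)$, with any scaling constants omitted, and the function name is hypothetical.

```python
import numpy as np

def both_sides_of_eq65(x_prev, k, l):
    """Return (Z, T @ X) for one frequency pair (k, l); the two should agree."""
    N = x_prev.shape[0]
    m = np.arange(N)[:, None]   # row index as a column vector
    n = np.arange(N)[None, :]   # column index as a row vector

    def four_sums(row_arg, col_arg):
        # cc, cs, sc, ss sums of x_prev against the given kernel arguments
        return np.array([
            np.sum(x_prev * np.cos(row_arg) * np.cos(col_arg)),
            np.sum(x_prev * np.cos(row_arg) * np.sin(col_arg)),
            np.sum(x_prev * np.sin(row_arg) * np.cos(col_arg)),
            np.sum(x_prev * np.sin(row_arg) * np.sin(col_arg)),
        ])

    X = four_sums(k * np.pi * (m + 0.5) / N, l * np.pi * (n + 0.5) / N)  # type-II kernels
    Z = four_sums(k * np.pi * m / N, l * np.pi * n / N)                  # type-I kernels

    a, b = k * np.pi / (2 * N), l * np.pi / (2 * N)
    ca, sa, cb, sb = np.cos(a), np.sin(a), np.cos(b), np.sin(b)
    T = np.array([
        [ ca * cb,  ca * sb,  sa * cb, sa * sb],
        [-ca * sb,  ca * cb, -sa * sb, sa * cb],
        [-sa * cb, -sa * sb,  ca * cb, ca * sb],
        [ sa * sb, -sa * cb, -ca * sb, ca * cb],
    ])
    return Z, T @ X
```

Because the 4 x 4 matrix depends only on $(k, l)$ and not on the image, the transformation costs a constant number of multiplications per coefficient.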


Stage   Component                         Computational Complexity
1       2D-DCT-II                         $O_{dct} = O(N)$
        Coeff. Transformation Unit (T)    $O(N^2)$
2       Pseudo Phase Computation          $O(N^2)$
3       2D-IDCT-II                        $O_{dct} = O(N)$
4       Peak Searching                    $O(N^2)$
        Estimation                        $O(1)$

TABLE II

Computational complexity of each stage in DXT-ME


If the DCT has computational complexity $O_{dct}$, the overall complexity of DXT-ME is $O(N^2) + O_{dct}$,
with the complexity of each component summarized in Table II. The computational complexity of the
pseudo phase computation component is only $O(N^2)$ for an $N \times N$ block, and so is that of the unit that
determines the displacement. For the computation of the pseudo phase functions $f(\cdot,\cdot)$ in (33) and
$g(\cdot,\cdot)$ in (34), DSCT, DCST and DSST coefficients (regarded as DST coefficients) must be calculated in
addition to DCCT coefficients (i.e. the usual 2D DCT). However, all these coefficients can be generated
with little overhead in the course of computing 2D DCT coefficients. As a matter of fact, a parallel and
fully pipelined 2D DCT lattice structure has been developed [25], [26], [27] to generate 2D DCT
coefficients at a cost of $O(N)$ operations. This DCT coder computes DCT and DST coefficients dually due
to its internal lattice architecture. These internally generated DST coefficients can be output to the
DXT-ME module for pseudo phase computation. This same lattice structure can also be modified as a 2D
IDCT, which also has $O(N)$ complexity. To sum up, the computational complexity of this DXT-ME is only
$O(N^2)$, much lower than the $O(N^4)$ complexity of BMA-ME.

A  closer  look  at  (33),  (34)  and  (65)  reveals  that  the  operations  of  pseudo  phase  computation  and 
coefficient  transformation  are  performed  independently  at  each  point  (k,l)  in  the  transform  domain 
and  therefore  are  inherently  highly  parallel  operations.  Since  most  of  the  operations  in  the  DXT-ME 
algorithm  involve  mainly  pseudo  phase  computation  and  coefficient  transformation  in  addition  to  DCT 
and  Inverse  DCT  operations  which  have  been  studied  extensively,  the  DXT-ME  algorithm  can  easily  be 
implemented  on  highly  parallel  array  processors  or  dedicated  circuits.  This  is  very  different  from  BMA- 
ME  which  requires  shifting  of  pixels  and  summation  of  differences  of  pixel  values  and  hence  discourages 
parallel  implementation. 
IV. Preprocessing Steps and Overlapping Approach

For complicated video sequences in which objects may move across block borders against a non-uniform
background, preprocessing can be employed to enhance the features of moving objects and avoid violating
the assumptions made for DXT-ME before feeding the images into the DXT-ME motion estimator.
Intuitively speaking, the DXT-ME algorithm tries to match the features of any object in two consecutive
frames, so that any translational motion can be estimated regardless of the shape and texture of the object
as long as these two frames contain a significant energy level of the object features. Due to this feature
matching property of the DXT-ME algorithm, effective preprocessing will improve the performance of
motion estimation if it can enhance the object features in the original sequence. In order
to keep the computational complexity of the overall motion estimator low, the chosen preprocessing
function must be simple but effective in the sense that unwanted features will not affect the accuracy of
estimation. Our study found that both edge extraction and frame differentiation are simple and effective
schemes for extraction of motion information.

Edges of an object can represent the object itself in motion estimation as its features [28] and contain
the information of motion without violating the assumption for DXT-ME. The other advantage of edge
extraction is that any change in the illumination condition does not alter the edge information and in
turn causes no false motion estimates by the DXT-ME algorithm. Since we only intend to extract the
main features of moving objects while keeping the overall complexity low, we employ a very simple edge
detection by convolving horizontal and vertical Sobel operators of size $3 \times 3$

$$H_s = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}, \quad V_s = H_s^T \tag{66}$$

with the image to obtain horizontal and vertical gradients respectively, and then combine both gradients
by taking the square root of the sum of their squares [29]. Edge detection provides
us with the features of moving objects but also the features of the background (stationary objects), which
is undesirable. However, if the features of the background have smaller energy than those of moving
objects within every block containing moving objects, then the background features will not affect the
performance of DXT-ME. The computational complexity of this preprocessing step is only $O(N^2)$ and
thus the overall computational complexity is still $O(N^2)$.
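The edge extraction step described above can be sketched with NumPy alone. The treatment of the image border ("valid" convolution, so the output is two pixels smaller along each axis) is an assumption of this sketch, since the text does not say how borders are handled; the function names are likewise illustrative.

```python
import numpy as np

H_S = np.array([[1, 0, -1],
                [2, 0, -2],
                [1, 0, -1]], dtype=float)  # horizontal Sobel operator from (66)
V_S = H_S.T                                # vertical Sobel operator

def convolve3x3(img, kernel):
    """'Valid' 2-D convolution with a 3x3 kernel (output shrinks by 2 per axis)."""
    flipped = kernel[::-1, ::-1]           # convolution flips the kernel
    rows, cols = img.shape[0] - 2, img.shape[1] - 2
    out = np.zeros((rows, cols))
    for i in range(3):
        for j in range(3):
            out += flipped[i, j] * img[i:i + rows, j:j + cols]
    return out

def edge_extract(img):
    """Gradient magnitude: square root of the sum of squared Sobel responses."""
    img = np.asarray(img, dtype=float)
    gh = convolve3x3(img, H_S)
    gv = convolve3x3(img, V_S)
    return np.sqrt(gh ** 2 + gv ** 2)
```

Each output pixel needs a fixed number of multiply-adds, which matches the $O(N^2)$ cost quoted for an $N \times N$ block.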

Frame differentiation generates an image of the difference of two consecutive frames. This frame
differentiated image contains no background objects, only the difference of moving objects between two
frames. The DXT-ME estimator operates directly on this frame differentiated sequence to predict
motion in the original sequence. The estimate will be good if the moving objects are moving constantly
in one direction in three consecutive frames. At 30 frames per second, the standard NTSC frame rate,
objects can usually be viewed as moving at a constant speed in three consecutive frames. Obviously,
this step also has only $O(N^2)$ computational complexity.
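The three-frame condition can be illustrated with a toy example. The sketch below assumes a simplified scene model in which a moving object is added onto a static background (a real object would occlude the background instead); the helper `synthetic_frame` and its parameters are hypothetical. Under this model, constant motion makes the second difference image a shifted copy of the first, so the displacement between difference images equals the displacement in the original sequence.

```python
import numpy as np

def frame_differences(frames):
    """Difference images d_t = x_t - x_{t-1} for a stack of frames (T, H, W)."""
    frames = np.asarray(frames, dtype=float)
    return frames[1:] - frames[:-1]

def synthetic_frame(step, background, shift=(2, 1)):
    """Hypothetical test frame: a 4x4 square added onto a static background."""
    frame = background.copy()
    r, c = 3 + step * shift[0], 3 + step * shift[1]
    frame[r:r + 4, c:c + 4] += 100.0
    return frame

rng = np.random.default_rng(0)
background = rng.random((16, 16))
frames = [synthetic_frame(t, background) for t in range(3)]
d = frame_differences(frames)

# With constant motion, d[1] is d[0] displaced by the per-frame shift (2, 1).
print(np.allclose(d[1], np.roll(d[0], (2, 1), axis=(0, 1))))
```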

[Figure 9 panels: (a) E-DXT-ME; (b) SE-DXT-ME. Each diagram takes $x_{t-1}$ and $x_t$ as inputs and outputs the estimated displacement $(m_1, n_1)$.]

Fig. 9. Block Diagrams of Extended DXT-ME Estimator (E-DXT-ME) and Simplified Extended DXT-ME
(SE-DXT-ME)

Alternatively, instead of using only one preprocessing function, we can employ several simple difference
operators in the preprocessing step to extract features of images as shown in Fig. 9(a), in which four
DXT-ME estimators generate four candidate estimates, of which one is chosen as the final estimated
displacement based upon either the mean squared error per pixel (MSE) [7] or the mean of absolute
differences per pixel (MAD) criterion [8].

Preferably, a simple decision rule similar to the one used in the MPEG-1 standard [21], as depicted
in Fig. 9(b), is used to choose between the DXT-ME estimate and no motion. This simplified extended
DXT-ME motion estimator works very well, as will be shown in the next section.
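A minimal sketch of such a selection step is given below. It picks, among the candidate displacements and the no-motion option, the one with the smallest MAD. This is one plausible reading of the decision rule, not the MPEG-1 rule itself, which is not spelled out here. The sign convention $x_t(m,n) \approx x_{t-1}(m-u, n-v)$ and all function names are assumptions of this sketch.

```python
import numpy as np

def mad(cur, prev, top, left, bs, disp):
    """MAD per pixel between a current block and the previous block displaced by disp = (u, v)."""
    u, v = disp
    r0, c0 = top - u, left - v
    if r0 < 0 or c0 < 0 or r0 + bs > prev.shape[0] or c0 + bs > prev.shape[1]:
        return np.inf  # candidate points outside the previous frame
    block = cur[top:top + bs, left:left + bs]
    return float(np.mean(np.abs(block - prev[r0:r0 + bs, c0:c0 + bs])))

def choose_displacement(cur, prev, top, left, bs, candidates):
    """Pick the candidate (or no motion) with the smallest MAD; ties favor no motion."""
    options = [(0, 0)] + list(candidates)
    return min(options, key=lambda d: mad(cur, prev, top, left, bs, d))
```

With a single candidate this reduces to the two-way choice of Fig. 9(b); with four candidates it corresponds to the selection stage of Fig. 9(a).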

In Section III, we mentioned that the peaks of DSC and DCS are searched over a fixed index range of
interest $\Phi = \{0, \ldots, N/2\}^2$. However, if we follow the partitioning approach used in BMA-ME, then
we may dynamically adjust $\Phi$. First, partition the whole current frame into $bs \times bs$ nonoverlapping
reference blocks, shown as the shaded area in Fig. 10(a). Each reference block is associated with a larger
search area (of size $sa$) in the previous frame (the dotted region in the same figure) in the same way
as for BMA-ME. From the position of a reference block and its associated search area, a search range
$V = \{(u,v) : -u_1 \le u \le u_2, -v_1 \le v \le v_2\}$ can then be determined as in Fig. 10(b). Unlike
BMA-ME, DXT-ME requires the reference block size and the search area size to be equal. Thus,
instead of using the reference block, we use the block of the same size and position in the current frame
as the search area of the previous frame. The peak values of DSC and DCS are searched in a zigzag way,
as described in Section III, over the index range
$\Phi = \{0, \ldots, \max(u_2, u_1 - 1)\} \times \{0, \ldots, \max(v_2, v_1 - 1)\}$.
In addition to the requirement that a new peak value must exceed the current peak value by
a preset threshold, it is necessary to check whether the motion estimate determined by the new peak index
lies in the search region $V$. Since search areas overlap one another, the SE-DXT-ME architecture
utilizing this approach is called Overlapping SE-DXT-ME.

Fig. 10. Overlapping approach: (a) reference block and search area; (b) search range; (c) DSC/DCS
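The bookkeeping for the search range and the peak index range can be sketched as follows. Since Fig. 10 is not reproduced here, the geometry is an assumption: the search area is taken to be centered on the reference block and clipped at the frame border, and $u$ is taken along rows and $v$ along columns. All function names are hypothetical.

```python
def search_range(top, left, bs, sa, height, width):
    """Return (u1, u2, v1, v2) so that -u1 <= u <= u2 and -v1 <= v <= v2.

    Assumes a search area of size sa centered on the bs x bs block at
    (top, left), clipped to a frame of size height x width.
    """
    margin = (sa - bs) // 2
    u1 = min(margin, top)                    # room above the block
    u2 = min(margin, height - (top + bs))    # room below the block
    v1 = min(margin, left)                   # room to the left
    v2 = min(margin, width - (left + bs))    # room to the right
    return u1, u2, v1, v2

def peak_index_range(u1, u2, v1, v2):
    """Index ranges {0..max(u2, u1-1)} x {0..max(v2, v1-1)} for the peak search."""
    return range(max(u2, u1 - 1) + 1), range(max(v2, v1 - 1) + 1)

def in_search_region(u, v, u1, u2, v1, v2):
    """Check that a motion estimate (u, v) lies inside the search region."""
    return -u1 <= u <= u2 and -v1 <= v <= v2
```

For an interior block with $bs = 16$ and $sa = 32$ this gives $u_1 = u_2 = v_1 = v_2 = 8$, so the peak search covers indices 0 through 8 in each direction.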


V.  Simulation  Results 


Simulations have been performed on a number of video sequences with different characteristics. The
performance of the DXT-ME scheme is evaluated in terms of MSE (mean squared error per pel), defined
as $\text{MSE} = \frac{\sum_{m,n}[x(m,n) - \hat{x}(m,n)]^2}{N^2}$, where $\hat{x}(m,n)$ is the predicted pel value, and
compared with that of the full search block matching method (BMA-ME), which minimizes the MSE
function over the whole search area:

$$\hat{d} = (\hat{u}, \hat{v}) = \arg\min_{u,v} \frac{\sum_{m,n=0}^{N-1}[x_2(m,n) - x_1(m-u, n-v)]^2}{N^2}.$$

The MSE values of two consecutive frames without motion compensation (DIF) are also computed for
comparison. The MSE value for DXT-ME is expected to be upper bounded by the MSE value without
motion compensation.
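For reference, the full-search criterion above can be sketched as a brute-force loop. The function name, the symmetric search window `max_disp`, and the rule of skipping candidates that fall outside the previous frame are assumptions of this sketch rather than details taken from the paper.

```python
import numpy as np

def full_search_bma(cur, prev, top, left, bs, max_disp):
    """Exhaustively search (u, v) minimizing the per-pel MSE for one block.

    Uses the convention cur(m, n) ~ prev(m - u, n - v).
    Returns ((u, v), mse) for the best candidate inside the previous frame.
    """
    block = cur[top:top + bs, left:left + bs]
    best, best_mse = (0, 0), np.inf
    for u in range(-max_disp, max_disp + 1):
        for v in range(-max_disp, max_disp + 1):
            r0, c0 = top - u, left - v
            if r0 < 0 or c0 < 0 or r0 + bs > prev.shape[0] or c0 + bs > prev.shape[1]:
                continue  # candidate block leaves the previous frame
            cand = prev[r0:r0 + bs, c0:c0 + bs]
            mse = float(np.mean((block - cand) ** 2))
            if mse < best_mse:
                best, best_mse = (u, v), mse
    return best, best_mse
```

The two nested loops over candidates, each costing one pass over the block, are where the $O(N^4)$ figure for BMA-ME comes from when the search window grows with the block size.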

To test the performance of DXT-ME on noisy images, an image of a small car (SCAR_1) is manually
shifted to produce the second frame (SCAR_2) with a known displacement, and additive Gaussian noise
is added to attain a desired signal-to-noise ratio (SNR) level. Since the object (small car) moves within the
boundary of the frame in a completely dark background, no preprocessing is required. As can be seen
in Fig. 11, DXT-ME is performed on the whole image of block size $64 \times 64$ and estimates the motion
correctly at SNR levels even down to 0 dB, whereas BMA-ME produces some wrong motion estimates
for boundary blocks and blocks of low signal energy. The values of MAD also indicate better overall
performance of DXT-ME over BMA-ME for this sequence. Furthermore, DXT-ME can operate on
the whole frame while BMA-ME needs to divide the frame into sub-blocks because it requires
search areas larger than the reference blocks. This is one of the reasons that BMA-ME does not work as
well as DXT-ME: a smaller block size makes BMA-ME more susceptible to noise, whereas operating
DXT-ME on the whole frame instead of on smaller blocks lends itself to better noise immunity. Even
though the Kalman filtering approach [30] can also estimate velocity accurately for a sequence of noisy
images, it requires complicated iterative computations, while DXT-ME can estimate motion from
two consecutive frames in one step with low-complexity computations.

[Figure 11 panels, for each SNR level: SCAR_1 and SCAR_2; estimated displacement (5, -3) by DXT motion
estimation, with displaced frame difference and histogram of DCS/DSC; block matching on noisy SCAR with
bs = 16 pels and sa = 32 or 24 pels, with displaced frame difference and estimated displacements]

Fig. 11. Comparison of DXT-ME of size 64 by 64 pels with Full-Search Block Matching Method (BMA-ME) of
block size (bs = 16 pels) but different search areas (sa = 32 or 24 pels) on a noisy small car (SCAR) with
(a) SNR = 10 dB, (b) SNR = 0 dB.
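The noise setup for this experiment can be sketched as below. The paper does not state how SNR is measured, so this sketch assumes the common definition of mean signal power over noise variance, expressed in dB; the function name is hypothetical.

```python
import numpy as np

def add_gaussian_noise(img, snr_db, rng):
    """Add zero-mean white Gaussian noise so that the result has roughly the requested SNR.

    Assumes SNR = 10 * log10(mean(img**2) / noise_variance).
    """
    img = np.asarray(img, dtype=float)
    signal_power = np.mean(img ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return img + rng.normal(0.0, np.sqrt(noise_power), img.shape)
```

At 0 dB the noise variance equals the mean signal power, which indicates how severe the lowest SNR setting in Fig. 11 is.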
(a) Original (b) Edge extracted (c) Frame differentiated

Fig.  12.  Frame  57  in  the  sequence  “Flower  Garden”  (FG) 


The  first  sequence  is  the  “Flower  Garden”  (FG)  sequence  where  the  camera  is  moving  before  a  big 
tree  and  a  flower  garden  in  front  of  a  house.  Each  frame  has  720  x  486  pixels.  Simple  preprocessing  is 
applied  to  this  sequence:  edge  extraction  or  frame  differentiation.  Simulation  of  SE-DXT-ME  of  block 
size  8,  16,  32,  as  well  as  64  was  performed  on  100  frames  of  this  sequence.  The  results  were  compared 
with  those  of  BMA-ME  of  block  size  8,  16,  32  with  the  search  area  being  twice  as  large  as  the  block 
size  as  shown  in  Fig.  13. 

As can be seen in Fig. 12(b), the edge extracted frames contain significant features of moving objects
in the original frames, so that DXT-ME can estimate the movement of the objects based upon the
information provided by the edge extracted frames. Because the camera is moving at a constant speed
in one direction, the moving objects occupy almost the whole scene. Therefore, the background features
do not interfere with the operation of DXT-ME. The frame differentiated images of the “Flower Garden”
sequence, one of which is shown in Fig. 12(c), have residual energy strong enough for DXT-ME to
estimate the motion directly on this frame differentiated sequence due to the constant movement of the
camera.

As can be observed in Fig. 13, a large reference block hampers the performance of BMA-ME, as indicated
by high MSE values, whereas increasing the block size boosts the performance of DXT-ME, giving smaller
MSE values. The reason is that a larger block for DXT-ME contains more object features,
which enables DXT-ME to find a better estimate due to its feature matching property, and a larger block
also means a larger search area because the size of a search area is the same as the block
size for DXT-ME. As a matter of fact, BMA-ME is supposed to perform better than DXT-ME if both
methods use the same block size because BMA-ME requires a larger search area and thus
has more information available before processing than DXT-ME. Therefore, it is not fair to compare
BMA-ME with DXT-ME for the same block size. Instead, it is more reasonable to compare BMA-ME
with DXT-ME of block size equal to the size of the search area of BMA-ME. As shown in Fig. 13,
the MSE values for DXT-ME of block size 32 (DXT32) preprocessed by either edge extraction or frame
differentiation are comparable with or even better than those for BMA-ME of block size 32 and search
size 64.

Fig. 13. Comparison of SE-DXT-ME with BMA-ME on “Flower Garden”

If the sequence is shrunk to half size ($352 \times 224$ pixels) to form the small “Flower Garden” sequence,
then each block will capture more features than a block of the same size in the original “Flower Garden”
sequence. The simulation results are plotted in Fig. 14 for SE-DXT-ME and Overlapping SE-DXT-ME
discussed in Section IV. As expected, the MSE values for SE-DXT-ME of block size 16 now become
as small as those for BMA-ME of block size 32, as shown in Fig. 14(a). Some points of the curve for
SE-DXT-ME of block size 32 (DXT32) with either preprocessing step are below the points of BMA-ME
of search size 32 (BKM16:32). If the overlapping approach is adopted in determining the search region,
Fig. 14(b) suggests that Overlapping SE-DXT-ME and BMA-ME have comparable performance
for either preprocessing method. The fact that a smaller frame size with the same contents improves the
performance of DXT-ME in terms of smaller MSE values suggests combining the hierarchical motion
estimation technique with DXT-ME (SE-DXT-ME or Overlapping SE-DXT-ME).

Another simulation is done on the “Infrared Car” sequence, which has a frame size of $192 \times 224$ and one
obvious moving object, the car moving along the curved road towards the camera fixed on the ground.
In Fig. 15(b), the features of both the car and the background are captured in the edge extracted
frames. Even though the background features are not desirable, the simulation for SE-DXT-ME of
various block sizes shows that the estimates of SE-DXT-ME produce low MSE values compared to the
result of BMA-ME, especially in certain frames such as the 6th to 13th frames shown in Fig. 16(a). This
can be explained by taking a closer look at the edge extracted frame in Fig. 15(b), where the background
features, such as the trees and poles, are far away from the car, especially in the 6th to 13th frames.
For the first few frames, the features of the roadside behind the car mix with the features of the car and
affect the performance of DXT-ME, but then the car moves away from the roadside towards the camera
so that the car features are isolated from the background features and DXT-ME can estimate motion
more accurately. As for the frame differentiated images shown in Fig. 15(c), the residual energy of the
moving car is completely separated from the rest of the scene in most of the preprocessed frames and,
therefore, lower MSE values are obtained with this preprocessing function than with edge extraction.
As can be seen in Fig. 16(a), the MSE value of SE-DXT-ME of block size 32 at Frame 9 is smaller
than that of BMA-ME of search area 32. In other words, SE-DXT-ME is better than BMA-ME in this
particular case. However, the results of Overlapping SE-DXT-ME in Fig. 16(b) show little improvement
over SE-DXT-ME with fixed $\Phi = \{0, \ldots, N/2\}^2$, in contrast to the large gain in the performance of
Overlapping SE-DXT-ME over SE-DXT-ME on the small “Flower Garden” sequence.

[Figure 14 panels: (a) Comparison of SE-DXT-ME with BMA-ME on small “Flower Garden”; (b) Comparison of
Overlapping SE-DXT-ME with BMA-ME on small “Flower Garden”, after edge detection and after frame differentiation]

Fig. 14. Simulation on small “Flower Garden”

[Figure 15 panels: (a) Original; (b) Edge extracted; (c) Frame differentiated]

Fig. 15. Sequence “Infrared Car” (CAR)

Simulation is also performed on the “Miss America” sequence, in which a lady is talking to the camera.
Each frame has $352 \times 288$ pixels. This sequence has only a little translational motion of the head and
shoulders; mainly the mouth and eyes open and close. This makes the task of motion estimation
difficult for this sequence, but the DXT-ME algorithm can still perform reasonably well compared to the
BMA-ME method, as can be seen in Fig. 17. In particular, for the Overlapping SE-DXT-ME algorithm
with block size 32, the MSE values are very close to those of BMA-ME of different block sizes, as shown
in Fig. 17(b).

The last sequence is the small “Table Tennis” sequence, which has a frame size of $352 \times 224$ pixels. In this
sequence, the first twenty-three frames contain solely a bouncing ball with the rough texture of the wall
behind. Thus, the edge extracted frames capture a lot of background features mixed with the ball
features. This negatively influences the estimation of DXT-ME. However, the frame differentiated
images have no such background features and, as a result, Overlapping SE-DXT-ME of block size 32
(DXT32:64) with frame differences as input performs much better than Overlapping SE-DXT-ME after
edge extraction (see Fig. 18(b)), even though the ball is not moving at a constant speed and its residual
energy after frame subtraction is weak. After the 23rd frame, the camera zooms out quickly, so that
no method predicts well. Then the zooming action slows down. In this situation, the MSE values of
SE-DXT-ME drop suddenly to as low as those of BMA-ME in Fig. 18(a).

[Figure 16 panels: (a) Comparison of SE-DXT-ME with BMA-ME; (b) Comparison of Overlapping SE-DXT-ME
with BMA-ME; each after edge detection and after frame differentiation]

Fig. 16. Comparison of DXT-ME with BMA-ME on “Infrared Car”


VI.  Conclusion 

In this paper, we presented the new motion estimation algorithm DXT-ME, which computes the DCT
pseudo-phases of images and employs the sinusoidal orthogonal principles to estimate motion in the
transform domain. In this way, it can be incorporated into codecs of various image compression protocols
such as MPEG and CCITT H.261, and enables us to utilize advances in DCT codecs, which are
under extensive research. In addition, motion estimation in the DCT domain enables us to remove the
IDCT component from the loop and move the DCT component out of the loop, as explained in Section I.
This not only reduces the coder complexity but at the same time increases the system throughput [22].
Furthermore, it requires much lower computational complexity, $O(N^2)$ as compared to $O(N^4)$ for
BMA-ME, and has very good noise immunity. Due to its feature matching property, it is possible to use
simple preprocessing to enhance the features of moving objects in order to further improve the
performance of DXT-ME, which is demonstrated to be comparable to that of BMA-ME in terms of MSE values
according to the simulation results on a number of video sequences with different visual characteristics.
Finally, this DXT-ME algorithm has inherently highly parallel operations and as a result it can easily
be implemented on highly parallel array processors or dedicated circuits.

[Figure 17 panels: (a) Comparison of SE-DXT-ME with BMA-ME; (b) Comparison of Overlapping SE-DXT-ME
with BMA-ME; each after edge detection and after frame differentiation]

Fig. 17. Comparison of DXT-ME with BMA-ME on “Miss America”

[Figure 18 panels: (a) Comparison of SE-DXT-ME with BMA-ME; (b) Comparison of Overlapping SE-DXT-ME
with BMA-ME, after edge detection and after frame differentiation]

Fig. 18. Comparison of DXT-ME with BMA-ME on small “Table Tennis”